Take-home project 1

Write your PSU email address here:

Share the notebook with aun1@psu.edu

Load the data

import pandas as pd

variants = pd.read_csv(
    "https://raw.githubusercontent.com/nekrut/bda/main/data/pf_variants.tsv",
    sep="\t"
)

variants.head()

Instructions

Our goal is to determine whether the malaria parasite (Plasmodium falciparum) infecting these individuals is resistant to pyrimethamine, an antimalarial drug. Resistance to pyrimethamine is conferred by a mutation in the PF3D7_0417200 (dhfr) gene (Cowman1988). Given sequencing data from four individuals, we will determine which of them is infected with Plasmodium falciparum carrying mutations in this gene.

Variant calls in the provided pandas data frame come from the analysis of four samples, two from Ivory Coast and two from Colombia:

Accession	Location
ERR636434	Ivory Coast
ERR636028	Ivory Coast
ERR042232	Colombia
ERR042228	Colombia

These accessions correspond to datasets stored in the Sequence Read Archive at NCBI.
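
The accession table above can be kept in the notebook as a small lookup frame and joined onto the variant calls, which is convenient later when grouping samples by country. The following is a minimal sketch under stated assumptions: `toy_variants` is a made-up stand-in for the real data frame, the column name `sample` is hypothetical (check `variants.columns` for the actual name), and the AF values are invented for illustration.

```python
import pandas as pd

# Accession -> sampling location, copied from the table above.
locations = pd.DataFrame(
    {
        "accession": ["ERR636434", "ERR636028", "ERR042232", "ERR042228"],
        "location": ["Ivory Coast", "Ivory Coast", "Colombia", "Colombia"],
    }
)

# Toy stand-in for the variant table. "sample" is a hypothetical column
# name and the AF values are made up.
toy_variants = pd.DataFrame(
    {
        "sample": ["ERR636434", "ERR042232", "ERR042228"],
        "AF": [0.9, 0.1, 1.0],
    }
)

# A left join keeps every variant row and attaches its location.
annotated = toy_variants.merge(
    locations, left_on="sample", right_on="accession", how="left"
)
print(annotated[["sample", "location", "AF"]])
```

With the real data, the same `merge` call would apply once `left_on` is set to whichever column holds the accession.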

(Data from MalariaGEN.)

Specifics

Filter variants falling within the dhfr gene.
Restrict variants to missense variants only, using the effect column.
You are specifically interested in the variant at amino acid position 108.
Create a graph that shows samples vs. variant coordinates, in which the size of each mark is proportional to the alternative allele frequency (AF column).
Create a graph showing a world map in which the allele frequencies of the samples are represented as pie charts placed on Colombia and on Ivory Coast. More specifically, each location has two samples, and each sample has an allele frequency at the resistance site. Use these allele frequencies as the areas of the pie chart slices.
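
The first three steps above amount to a chain of boolean filters. The sketch below shows the pattern on a toy data frame with made-up values; it is not a solution for the real data. Only `effect` and `AF` are column names mentioned in this assignment. The names `sample`, `gene`, `pos`, and `aa_pos`, as well as the effect label `missense_variant`, are assumptions, so inspect `variants.columns` and `variants["effect"].unique()` before adapting it.

```python
import pandas as pd

# Toy data with invented values. Hypothetical columns: sample, gene, pos, aa_pos.
toy = pd.DataFrame(
    {
        "sample": ["ERR636434", "ERR636434", "ERR042232", "ERR042228"],
        "gene": ["PF3D7_0417200", "PF3D7_0417200", "PF3D7_0417200", "OTHER_GENE"],
        "effect": [
            "missense_variant",
            "synonymous_variant",
            "missense_variant",
            "missense_variant",
        ],
        "pos": [1000, 1010, 1000, 2000],
        "aa_pos": [108, 111, 108, 50],
        "AF": [0.95, 0.40, 0.10, 0.80],
    }
)

# Step 1: keep variants inside the dhfr gene.
in_dhfr = toy["gene"] == "PF3D7_0417200"

# Step 2: keep missense variants (substring match tolerates label variations).
is_missense = toy["effect"].str.contains("missense", case=False, na=False)
dhfr_missense = toy[in_dhfr & is_missense]

# Step 3: focus on amino acid position 108.
site_108 = dhfr_missense[dhfr_missense["aa_pos"] == 108]
print(site_108[["sample", "pos", "AF"]])
```

A frame like `site_108` has one row per sample, so its `AF` column can be scaled into marker sizes for the sample-vs-coordinate plot.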

You can use any AI tool you want (preferably the one integrated in Colab), but you will never get exactly what you want, so you will have to adjust its output. You will have to explain to me what the steps were.